Learning from Video
EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
HumanoidExo: Scalable Whole-Body Humanoid Manipulation via Wearable Exoskeleton
VideoMimic
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
Challenges and Trends in Egocentric Vision: A Survey
Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
EgoMimic: Scaling Imitation Learning via Egocentric Video
Automatic Generation of Two-Level Hierarchical Tutorials from Instructional Makeup Videos
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Multimodal Language Models for Domain-Specific Procedural Video Summarization
Screencast Tutorial Video Understanding
Learning To Recognize Procedural Activities with Distant Supervision
A Comprehensive Survey of Procedural Video Datasets